feat(openai-agents): add @temporalio/openai-agents package#2024
Draft
Integration for running OpenAI Agents SDK workflows as durable Temporal workflows. Model calls become Temporal activities; the agent loop (tools, handoffs, guardrails) runs in the workflow.
Package layout:
- plugin: OpenAIAgentsPlugin wires the model activity and MCP providers
- workflow side: createTemporalRunner(), activityAsTool(), statelessMcpServer(), tracing utilities
- activity side: invokeModelActivity, error classification, retry-after header support, auto-heartbeating
- testing namespace: FakeModel, FakeModelProvider, GeneratorFakeModel, ResponseBuilders (textResponse / toolCallResponse / handoffResponse / multiToolCallResponse)
Correctness-critical behavior:
- Handoff conversion: shallow-clones the user's Handoff objects to avoid mutation; preserves the onHandoff, inputType, and isEnabled callbacks; recursive cycle detection handles cyclic handoff graphs
- Error classification: inspects the OpenAI SDK error shape (direct .status/.headers and legacy .response.*), honors the x-should-retry and retry-after headers, and derives ModelInvocationError subtypes from the status code
- Tool validation: tags activityAsTool-wrapped tools with a symbol marker, rejects raw functions and FunctionTools built via the bare tool() factory, and recurses into handoff agents
- AgentsWorkflowError wraps non-Temporal errors as the cause of an ApplicationFailure; TemporalFailures in the cause chain are unwrapped rather than re-wrapped
Known deferrals (documented in src/index.ts):
- StatefulMCPServerProvider, nexusOperationAsTool, OTel trace interceptor, workflowFailureExceptionTypes registration, testing.AgentEnvironment, testing.ResponseBuilders class
Tests: 61 integration tests covering the full feature surface, plus bug-reproduction coverage for every fix landed across 4 audit rounds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
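The error-classification bullet combines three signals: the status code, `x-should-retry`, and `retry-after`. The sketch below shows one way they could compose, assuming headers arrive as a plain lowercase-keyed object. The function name, the subtype strings, and the status mapping are hypothetical (not the package's actual ModelInvocationError hierarchy), and only the delay-seconds form of `retry-after` is handled.

```typescript
// Hypothetical sketch: names and the status -> subtype mapping are illustrative.
type HeaderBag = Record<string, string | undefined>;

interface ErrorLike {
  status?: number;
  headers?: HeaderBag;
  response?: { status?: number; headers?: HeaderBag }; // legacy error shape
}

interface Classification {
  subtype: string;
  retryable: boolean;
  retryAfterMs?: number;
}

function classifyModelError(err: ErrorLike): Classification {
  // Prefer the direct shape, fall back to the legacy `.response.*` shape.
  const status = err.status ?? err.response?.status;
  const headers = err.headers ?? err.response?.headers ?? {};

  // Retry-After may also be an HTTP-date; only delay-seconds is handled here.
  const raw = headers['retry-after'];
  const seconds = raw !== undefined ? Number(raw) : NaN;
  const retryAfterMs = Number.isFinite(seconds) ? seconds * 1000 : undefined;

  const subtype =
    status === 429 ? 'RateLimit'
    : status === 401 || status === 403 ? 'Authentication'
    : status !== undefined && status >= 500 ? 'Server'
    : status !== undefined && status >= 400 ? 'BadRequest'
    : 'Unknown';

  // An explicit x-should-retry header overrides the status-based default.
  const hint = headers['x-should-retry'];
  const retryable =
    hint === 'true' ? true
    : hint === 'false' ? false
    : subtype === 'RateLimit' || subtype === 'Server' || subtype === 'Unknown';

  return { subtype, retryable, retryAfterMs };
}
```

With this shape, a 429 carrying `retry-after: 2` comes back as a retryable `RateLimit` with a 2000 ms delay hint, while a legacy-shaped 500 with `x-should-retry: false` is treated as non-retryable.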
…e TemporalModelStub to ActivityBackedModel, drop unused plugin options
Side-aware src layout — directory enforces what was previously a
convention. workflow/ holds the agent loop + activity proxies that run
in the workflow sandbox; worker/ holds the plugin and activity
implementations; common/ holds types referenced by both sides.
mcp.ts splits into workflow/mcp-client.ts (statelessMcpServer factory
and types) and worker/mcp-provider.ts (StatelessMCPServerProvider
class). ActivityModelInput moves to common/ since both sides type-
reference it.
Renames:
- TemporalModelStub -> ActivityBackedModel. "Stub" reads as
"test-double" to most modern callers; the class is production hot-
path code that proxies model calls to an activity. New name says
what it is.
- src/workflow.ts and src/index.ts now re-export from the
side-specific dirs; their public surface is unchanged.
Plugin cleanup:
- Drop unused modelParams field from OpenAIAgentsPluginOptions.
Constructor never read it; runtime config lives on
createTemporalRunner({modelParams}) workflow-side.
- Drop createOpenAIAgentsPlugin factory. Other SDK plugins
(AiSdkPlugin, OpenTelemetryPlugin) export class only; users do
`new OpenAIAgentsPlugin({...})` to match.
package.json exports paths updated for the new layout. Backward-
compat ./lib/* aliases remap to the new locations.
No behavior change. 60/61 tests pass; the one occasional flake is a
known dev-server resource-contention issue tracked separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… serialized-model contract
The workflow→activity model-call boundary now goes through explicit serialized types and field-by-field projections in both directions. This replaces the ad-hoc destructure-and-strip in activity-backed-model.ts:58 plus the `as any` casts in activities.ts:125.
Contract (src/common/serialized-model.ts):
- SerializedModelRequest / SerializedModelResponse: JSON-safe projections of upstream ModelRequest / ModelResponse. JsonValue replaces unknown on the wire.
- WIRE_VERSION literal field on both, validated activity-side. A mismatch throws a non-retryable WireVersionMismatch ApplicationFailure, which protects rolling deploys where the workflow code runs a different package version than the worker.
- signal is excluded from SerializedModelRequest by design (AbortSignal is not serializable; Temporal cancellation provides the equivalent).
Projections live with their consumers:
- toSerializedModelRequest + fromSerializedModelResponse inline in workflow/activity-backed-model.ts (workflow side).
- toSerializedModelResponse + fromSerializedModelRequest inline in worker/activities.ts (worker side).
fromSerializedModelResponse reconstitutes Usage via new Usage(...) so .add() keeps working across multi-turn runs. fromSerializedModelRequest strips __wireVersion before passing the request to the upstream model, so the internal protocol field doesn't leak through getResponse().
Public exports:
- SerializedModelRequest, SerializedModelResponse, InvokeModelActivityInput, JsonValue, WIRE_VERSION exported from both index.ts (worker side) and workflow.ts (workflow side).
- toSerializedModelRequest exported via workflow.ts (the test workflow imports it).
- toSerializedModelResponse exported via index.ts.
Tests added in packages/test/src/test-openai-agents.ts:
- Round-trip: prompt + tracing survive workflow→activity→workflow, Usage round-trips with a working .add(), and __wireVersion is stripped in both directions.
- Stripping: signal is absent activity-side after the additive projection.
- Version mismatch: a stale __wireVersion throws WireVersionMismatch.
- Snapshot: the shape of toSerializedModelRequest / toSerializedModelResponse output matches the expected key list; the test fails loudly with a "bump WIRE_VERSION" message if upstream adds a field that gets silently copied through.
Replaces src/common/activity-model-input.ts with src/common/serialized-model.ts.
No behavior change to the user-facing API. 66/66 tests pass; one occasional flake under load passes in isolation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
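A minimal sketch of the versioned contract, assuming a simplified request with only `input` and `signal`. `WIRE_VERSION`, `__wireVersion`, and `WireVersionMismatch` are named after the commit. The field set, the numeric version value, and the plain `Error` subclass (standing in for a non-retryable ApplicationFailure) are illustrative assumptions.

```typescript
// Hypothetical sketch of a versioned wire contract.
type JsonValue = string | number | boolean | null | JsonValue[] | { [k: string]: JsonValue };

const WIRE_VERSION = 1 as const;

interface ModelRequestLike {
  input: JsonValue;
  signal?: unknown; // an AbortSignal in practice; never sent over the wire
}

interface SerializedModelRequest {
  __wireVersion: typeof WIRE_VERSION;
  input: JsonValue;
}

// Stand-in for a non-retryable ApplicationFailure.
class WireVersionMismatch extends Error {}

// Workflow side: explicit field-by-field projection (no object spread), so a
// field added upstream is never silently copied onto the wire.
function toSerializedModelRequest(req: ModelRequestLike): SerializedModelRequest {
  return { __wireVersion: WIRE_VERSION, input: req.input };
}

// Activity side: validate the version, then rebuild without the protocol field.
function fromSerializedModelRequest(wire: { __wireVersion: unknown; input: JsonValue }): ModelRequestLike {
  if (wire.__wireVersion !== WIRE_VERSION) {
    throw new WireVersionMismatch(
      `expected wire version ${WIRE_VERSION}, got ${String(wire.__wireVersion)}`,
    );
  }
  return { input: wire.input };
}
```

Projecting field by field on both sides, rather than spreading and deleting, is what makes the snapshot test meaningful: nothing crosses the boundary unless a projection names it.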
Implements TemporalTracingProcessor, a TracingProcessor that maps OpenAI Agents trace/span events to OTel spans. Removes the tracingDisabled flag on the internal Runner so trace events fire.
Architecture:
- Workflow-side TracingProcessor (src/workflow/tracing.ts) listens for agent loop events (agent, generation, function, handoff, guardrail, custom, response, transcription, speech, speech_group, mcp_tools).
- Each event creates a child OTel span via @opentelemetry/api, parented under the active OTel context (typically the workflow execution span registered by interceptors-opentelemetry).
- Replay-safe: skips span creation during workflow replay via an isReplaying() guard. End events implicitly skip too: the entry map is empty during replay, so onSpanEnd / onTraceEnd are no-ops.
- Idempotent registration: a Symbol.for() flag on globalThis ensures the processor registers once per workflow isolate.
- Uses addTraceProcessor() (not setTraceProcessors): preserves user processors registered before runner construction, with no risk of wiping them.
Wiring:
- The TemporalOpenAIRunner constructor calls ensureTracingProcessorRegistered(), which registers the processor and calls setTracingDisabled(false) to override the upstream NODE_ENV=test default that gates trace creation.
- ActivityBackedModel.getResponse() wraps both the normal and summaryOverride paths in withGenerationSpan() so generation spans fire. Upstream's built-in adapters do this themselves; ours has to mirror the pattern.
- Removed tracingDisabled: true from the internal Runner config.
Span attributes are split into static (set at start: type, name, handoffs, output_type, from_agent, to_agent, server) and dynamic (set at end: tools, model, triggered, result), which avoids a redundant double-set.
Known limitation (documented in code): activity spans from interceptors-opentelemetry appear as siblings of generation spans, not children. Proper nesting would require pushing OTel context before the activity call. Deferred to a follow-up; the current hierarchy still gives users visibility into the agent loop.
Public exports:
- TemporalTracingProcessor (workflow-side)
- ensureTracingProcessorRegistered
Adds @opentelemetry/api ^1.9.0 as a regular dependency (matching interceptors-opentelemetry's pattern).
Tests: T1 verifies the tracing path is active and produces trace + agent + generation/response span events. Full OTel span emission verification is deferred; it requires an in-memory exporter + interceptors-opentelemetry workflow bundle integration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
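The idempotent-registration and replay-guard bullets could look roughly like the sketch below. It rests on stated assumptions: `processors` and `addTraceProcessor` are local stand-ins for upstream's registry, `isReplaying` is injected rather than read from Temporal, and the symbol key is hypothetical.

```typescript
// Hypothetical sketch: a local registry stands in for upstream's processor list.
interface Processor {
  onSpanStart(name: string): void;
}

const REGISTERED = Symbol.for('example.tracingProcessorRegistered');
const processors: Processor[] = [];

function addTraceProcessor(p: Processor): void {
  processors.push(p); // appends, never replaces user-registered processors
}

function ensureTracingProcessorRegistered(make: () => Processor): void {
  const g = globalThis as unknown as Record<symbol, unknown>;
  if (g[REGISTERED]) return; // already registered in this isolate
  g[REGISTERED] = true;
  addTraceProcessor(make());
}

function makeReplaySafeProcessor(isReplaying: () => boolean, started: string[]): Processor {
  return {
    onSpanStart(name) {
      if (isReplaying()) return; // spans were already emitted on the original run
      started.push(name);
    },
  };
}
```

Appending (rather than replacing the list) is what keeps user processors intact, and the global symbol flag makes repeated runner construction in the same isolate harmless.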
…tyOptions, drop createTemporalRunner factory + createModelActivity export, harden wire layer
Two batches landed together since they share the activity-backed-model.ts file footprint.
Batch A (public API cleanup):
- ModelActivityParameters → ModelActivityOptions. Matches the *Options convention used elsewhere in the SDK (ActivityOptions, LocalActivityOptions, WorkerOptions). DEFAULT_MODEL_ACTIVITY_PARAMETERS → DEFAULT_MODEL_ACTIVITY_OPTIONS. File model-parameters.ts → model-activity-options.ts.
- Drop the createTemporalRunner factory. It was a pure `new` shortcut with no added value. Users call `new TemporalOpenAIRunner(opts)` directly.
- Demote createModelActivity from public exports. The plugin auto-registers the activity, so there's no reason to advertise the manual-registration bypass route. The function stays in worker/activities.ts for the plugin to use; it is just no longer exported from index.ts.
Batch B (wire / serialization layer hardening):
- Asymmetric exports made consistent. to* projections are public (part of the wire contract); from* projections are private (implementation detail).
- Cast comments at every type-assertion boundary now document why each is safe: input/prompt/tracing as JsonValue, Usage data as JsonValue, AgentOutputItem[] as JsonValue[]. No `as any` regressions.
- Usage class reconstruction comment clarified: Usage is the only class instance needing reconstitution post-wire, since AgentOutputItem variants are all Zod-inferred plain objects.
- providerData JSDoc expanded with an explicit coercion warning (Date → ISO string; Map/Set/class instances flattened by Temporal's JSON codec).
- Inline rationale at the activities.ts version-check site (no longer references CLEANUP.md, which is a local-only artifact).
- Removed the duplicate `signal` exclusion comment from activity-backed-model.ts. The canonical comment lives in serialized-model.ts.
- tracing field commentary updated to accurately describe ModelTracing as an enablement flag (not a context carrier); span-context propagation belongs in TemporalTracingProcessor.
- Two new drift-detection tests in test-openai-agents.ts: a JSON round-trip on a populated SerializedModelRequest / SerializedModelResponse, compared with deepEqual, fails loudly if upstream adds a non-serializable field.
No behavior change. 69/69 tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
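The drift-detection idea (a value is wire-safe only if a JSON round-trip reproduces it) can be sketched with a small hand-rolled `deepEqual` that also compares prototypes. That way, Dates, Maps, and class instances are flagged rather than silently flattened. The helper names are hypothetical; the package's tests use their own assertion library.

```typescript
// Hypothetical sketch of a JSON round-trip drift check.
function deepEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== 'object' || typeof b !== 'object' || a === null || b === null) return false;
  // A Date, Map, or class instance never equals the plain object JSON yields.
  if (Object.getPrototypeOf(a) !== Object.getPrototypeOf(b)) return false;
  const oa = a as Record<string, unknown>;
  const ob = b as Record<string, unknown>;
  const ka = Object.keys(oa);
  if (ka.length !== Object.keys(ob).length) return false;
  return ka.every((k) => Object.prototype.hasOwnProperty.call(ob, k) && deepEqual(oa[k], ob[k]));
}

function survivesJsonRoundTrip(value: unknown): boolean {
  return deepEqual(JSON.parse(JSON.stringify(value)), value);
}
```

Note that an explicit `undefined` property also fails the check, because `JSON.stringify` drops the key entirely.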
…ety fixes
Two batches landed together since they share test-file footprint.
Batch C — convert-agent.ts hardening (7 items):
- Move unwrapTemporalFailure from convert-agent.ts to common/errors.ts.
It's an error utility, not agent conversion logic; co-locating with
AgentsWorkflowError gives one place for error helpers.
- Replace `'default'` model-name fallback with explicit
AgentsWorkflowError thrown at convert time. Users now get a clear
"no model declared" error at workflow start instead of an opaque
activity failure on first model call.
- Introduce getAgentInternals() helper in src/workflow/agent-internals.ts
centralizing the unsafe access to upstream Agent's model/handoffs/tools
fields. Future upstream type changes touch one file.
- Drop `as Model` cast on agent.clone({ model: activityBackedModel }).
ActivityBackedModel implements Model and is structurally compatible —
TS accepts it without the cast.
- Expand setAgent comment to explain why the original (pre-clone) agent
is bound to the summary provider, not the cloned wrapper.
- Add CLEANUP-6 test asserting the Object.create-based Handoff clone
preserves all upstream-documented fields and prototype identity.
Future upstream Handoff additions trip this test.
- Top-of-file contract block in convert-agent.ts pinning the implicit
upstream contracts (Agent.clone, Handoff.onInvokeHandoff,
Agent.handoffs) to @openai/agents-core ~0.3.0 — checklist for dep
upgrades.
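The Object.create-based Handoff clone that the CLEANUP-6 test protects can be sketched as follows. `HandoffLike` is a hypothetical stand-in for upstream's Handoff class. The point is that the copy keeps the original prototype (so `instanceof` still holds) and callbacks such as `isEnabled`, while the caller's object is left untouched.

```typescript
// Hypothetical stand-in for upstream's Handoff class.
class HandoffLike {
  toolName: string;
  onInvokeHandoff: (input: string) => string;
  isEnabled: () => boolean = () => true;

  constructor(toolName: string, onInvokeHandoff: (input: string) => string) {
    this.toolName = toolName;
    this.onInvokeHandoff = onInvokeHandoff;
  }
}

// Shallow clone: same prototype, own fields copied, then selected fields
// overridden. The original object is never mutated.
function cloneHandoff<T extends object>(h: T, overrides: Partial<T>): T {
  const copy = Object.create(Object.getPrototypeOf(h)) as T;
  return Object.assign(copy, h, overrides);
}
```

A spread (`{ ...h }`) would copy the same own fields but produce a plain object, losing the prototype that the test asserts on.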
Batch D — tracing replay-safety fixes:
- Symmetric replay gating: onTraceEnd and onSpanEnd now have the same
isReplaying() guard as their start counterparts. Matches Python's
uniform gating; no longer relies on Map-empty-after-replay as the
implicit gate.
- Workflow-scoped spans Map: Map<workflowId, Map<spanId, SpanEntry>>
instead of a flat Map shared across all workflows in the V8 isolate.
Cross-workflow leaks impossible. Outer entry is auto-cleaned when
the inner Map empties.
- Document that deterministic trace/span IDs come from the
crypto.randomUUID polyfill in load-polyfills.ts (delegates to
workflow.uuid4(), which is per-workflow seeded). No code change for
this — verified via the new replay-safety test.
- Add T2 replay-safety test using maxCachedWorkflows: 0 to force
workflow eviction after every task. Asserts replay actually occurred
AND the workflow completes without NondeterminismError. Proves the
trace processor's replay gating + the polyfill-backed deterministic
IDs hold up under forced replay.
- Delete getWorkflowTracingConfig() — was a dead function that always
returned 'enabled'. Removed from public exports.
- Expand JSDoc on ensureTracingProcessorRegistered documenting the
global side-effect (mutates upstream's processor list) and the
per-isolate singleton behavior so users aren't surprised.
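A sketch of the workflow-scoped bookkeeping from Batch D, assuming string workflow IDs and a hypothetical `SpanEntry` record. In the package the workflow ID would presumably come from the workflow's own info; here it is passed explicitly.

```typescript
// Hypothetical sketch of Map<workflowId, Map<spanId, SpanEntry>>.
interface SpanEntry {
  name: string;
  startedAt: number;
}

class ScopedSpanStore {
  private readonly byWorkflow = new Map<string, Map<string, SpanEntry>>();

  start(workflowId: string, spanId: string, entry: SpanEntry): void {
    let spans = this.byWorkflow.get(workflowId);
    if (!spans) {
      spans = new Map<string, SpanEntry>();
      this.byWorkflow.set(workflowId, spans);
    }
    spans.set(spanId, entry);
  }

  end(workflowId: string, spanId: string): SpanEntry | undefined {
    const spans = this.byWorkflow.get(workflowId);
    const entry = spans?.get(spanId);
    spans?.delete(spanId);
    // Drop the outer entry once the workflow has no open spans left.
    if (spans && spans.size === 0) this.byWorkflow.delete(workflowId);
    return entry;
  }

  get workflowCount(): number {
    return this.byWorkflow.size;
  }
}
```

Because lookups are keyed by workflow first, two workflows in the same isolate can reuse a span ID without one ending the other's span.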
71/71 tests pass. ESLint + Prettier clean. Both batches were reviewed by
code-auditor and comment-auditor; all findings were applied.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…(Python parity)
Single combined batch: runner.ts cleanups and the validateTools rewrite
share the same code path. 8 runner.ts items plus the tool-validation relaxation.
Runner cleanups:
- Remove runStreamed method entirely. Calling it now produces a clean
TypeError ("not a function") instead of a custom throw. We don't
extend Runner so we have no obligation to expose it.
- Drop AgentsWorkflowError class. The string 'AgentsWorkflowError' on
ApplicationFailure.type already serves as the marker; the wrapper
class added duplication and an extra hop on the cause chain. Now
errors flow directly: ApplicationFailure(type='AgentsWorkflowError',
cause: originalError). BREAKING: the class is no longer exported.
Users doing `instanceof AgentsWorkflowError` should switch to
ApplicationFailure type-tag checks.
- Tighten TemporalRunOptions.runConfig.model to `string` only.
Workflow can't serialize Model objects across the activity boundary,
so accepting `string | Model` was a typed lie. Runtime guard
deleted; TypeScript catches misuse at the call site.
- Forward all upstream RunConfig fields explicitly to internalRunner.
Previously dropped silently: handoffInputFilter, inputGuardrails,
outputGuardrails, modelSettings, tracingDisabled,
traceIncludeSensitiveData, workflowName, traceId, groupId,
traceMetadata, conversationId, session, sessionInputCallback,
callModelInputFilter, tracing. Each new field has a JSDoc comment
noting Temporal-specific caveats (e.g. guardrails must be
deterministic, signal omitted in favor of CancellationScope).
Convert-agent + tool validation:
- Fold validateTools into convertAgent. Single graph traversal now
handles validation, model conversion, and Handoff cloning — was
three full walks. ~40 lines of duplication removed.
- Drop TEMPORAL_ACTIVITY_TOOL_MARKER gate. Upstream tool() factory
products are now accepted inline in workflow context — Python
parity. Raw functions still rejected. The marker constant stays
for debugging / future introspection.
- Tool-type allowlist comment in convertAgent listing every accepted
upstream tool variant alphabetically. Note that ApplyPatch /
Computer / Shell tools pass validation but will fail at runtime in
the sandbox (require local I/O).
- getAgentInternals helper now used for tools as well as
model/handoffs — single source of truth for upstream Agent
property access.
- Tighten convert-agent.ts error message for non-string Model. Points
users at runConfig.model: string as the override path.
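The folded single traversal can be sketched over plain objects. `AgentNode`, `ConvertedAgent`, the error messages, and the `activity:` model placeholder are all hypothetical; the package works with upstream Agent and Handoff classes and an activity-backed Model. The key detail is registering each converted node before recursing, which both terminates cyclic handoff graphs and reuses shared sub-agents.

```typescript
// Hypothetical agent-graph shapes; upstream Agent/Handoff classes differ.
interface AgentNode {
  name: string;
  model?: string;
  tools: unknown[];
  handoffs: AgentNode[];
}

interface ConvertedAgent {
  name: string;
  model: string; // placeholder for an activity-backed model
  tools: unknown[];
  handoffs: ConvertedAgent[];
}

// One walk does validation, model conversion, and cloning.
function convertAgent(root: AgentNode): ConvertedAgent {
  const seen = new Map<AgentNode, ConvertedAgent>();

  const visit = (agent: AgentNode): ConvertedAgent => {
    const existing = seen.get(agent);
    if (existing) return existing; // cycle or shared sub-agent: reuse the clone

    for (const t of agent.tools) {
      if (typeof t === 'function') {
        throw new Error(`Agent "${agent.name}": raw functions are not tools; wrap them with tool() or activityAsTool()`);
      }
    }
    if (agent.model === undefined) {
      throw new Error(`Agent "${agent.name}" declares no model`);
    }

    // Register before recursing so cyclic handoffs terminate.
    const converted: ConvertedAgent = { name: agent.name, model: `activity:${agent.model}`, tools: agent.tools, handoffs: [] };
    seen.set(agent, converted);
    converted.handoffs = agent.handoffs.map(visit);
    return converted;
  };

  return visit(root);
}
```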
Tests:
- E3/F20 inverted from rejection-test to inline-success-test. A
deterministic tool() product runs in the workflow without
activityAsTool; verifies the output round-trips.
- C3/F27 simplified to verify TypeError surfaces as
WorkflowFailedError. No specific message check since the method
doesn't exist at all now.
- C1/F7 updated: assertion on causeName is now 'Error' (the original
error directly on cause), not 'AgentsWorkflowError' (which would
imply a wrapper).
- H2 raw-function-rejection tests still pass; message strings
updated to match the tightened error wording.
Migration note: AgentsWorkflowError class removed from public API.
Users identifying these errors should switch from
`e instanceof AgentsWorkflowError` to checking
`(e as ApplicationFailure).type === 'AgentsWorkflowError'` on the
serialized failure.
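For the migration, a check that walks the cause chain looking for the type tag might look like this. `FailureLike` is a structural stand-in for Temporal's failure classes. The assumption is that, on the client, the ApplicationFailure typically sits in the `cause` of a WorkflowFailedError, so the helper searches the chain rather than inspecting only the top-level error.

```typescript
// Hypothetical structural stand-in for Temporal failure objects.
interface FailureLike {
  type?: string;
  cause?: unknown;
}

// Walk the cause chain (bounded, to guard against cycles) looking for the
// 'AgentsWorkflowError' type tag instead of using instanceof.
function findAgentsWorkflowFailure(err: unknown): FailureLike | undefined {
  let current: unknown = err;
  for (let depth = 0; current != null && depth < 32; depth++) {
    const f = current as FailureLike;
    if (f.type === 'AgentsWorkflowError') return f;
    current = f.cause;
  }
  return undefined;
}
```

The original error is then available directly as the tagged failure's `cause`, with no wrapper class in between.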
71/71 tests pass. ESLint + Prettier clean. Both audits PASS; the only
note was the breaking class removal, which is intentional per the CLEANUP.md spec.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ugin/options polish
Two batches landed in parallel. They touch disjoint primary files but share the test-file footprint, so they are committed as one.
Batch F (tracing remaining gaps + plugin modelParams + concurrent test):
- New TemporalOpenAIRunnerOptions extends ModelActivityOptions with startSpansInReplay?: boolean. Plumbed through to TemporalTracingProcessor via ensureTracingProcessorRegistered, so callers can opt into emitting spans during replay when debugging replay-divergence issues. Default false.
- TemporalTracingProcessor's four event methods now gate via a single shouldSkip() helper that respects startSpansInReplay. They previously used four inline isReplaying() checks.
- Activity-span nesting comment in tracing.ts updated. It previously said "deferred to a follow-up"; it now correctly explains that activity spans nest under generation spans when @temporalio/interceptors-opentelemetry is configured. Investigated a worker-side TracingProcessor and concluded it isn't needed: OpenAI Agents SDK trace events fire workflow-side only, and the activity span just needs OTel context propagation, which interceptors-opentelemetry already provides.
- OpenAIAgentsPluginOptions accepts modelParams?: ModelActivityOptions as a config-surface field. The plugin runs worker-side and can't inject config into the V8 workflow sandbox, so users must still pass modelParams to new TemporalOpenAIRunner(options) in workflow code. The JSDoc says so explicitly; future versions may auto-propagate it via workflow interceptors.
- T3 test verifies that two concurrent workflows on one worker have isolated trace IDs (no cross-pollination), exercising the workflow-scoped Map added in 2d7949a.
Batch H (model-activity-options enhancements + testing.ts consolidation):
- model-activity-options.ts: cancellationType defaults to ActivityCancellationType.TRY_CANCEL so cancellations reach the activity cooperatively. JSDoc on every public field. versioningIntent was investigated and dropped: the upstream Worker Versioning API is deprecated, and the Worker Deployment API is the path forward whenever that becomes a need. Adding a deprecated option to a new surface would create immediate tech debt.
- testing.ts: FakeModel and the former GeneratorFakeModel collapse into a single FakeModel that takes ModelResponse[] | Generator<ModelResponse>. FakeModelProvider takes the same union, with a factory variant for the generator side. The deprecated aliases GeneratorFakeModel and GeneratorFakeModelProvider were removed entirely (BREAKING for tests, but low surface: they were only used inside our own test suite).
- The ResponseBuilders const object exposes text / toolCall / handoff / multiToolCall, giving namespace-style access in addition to the existing flat exports. Mirrors Python's TestModel grouping.
- The src/testing.ts barrel re-exports from worker/testing.ts so consumers can import from @temporalio/openai-agents/lib/testing without diving into worker/. The lib/testing path is already used by stubs in packages/test.
- The activity-backed model wires cancellationType through to both the proxyActivities and proxyLocalActivities options. User overrides win since DEFAULT_MODEL_ACTIVITY_OPTIONS is spread first.
Public API removals:
- GeneratorFakeModel / GeneratorFakeModelProvider (test utility aliases, not used by production code).
72/72 tests pass. ESLint + Prettier clean. Both audits PASS. The code auditor noted that T3 proves trace-ID disjointness via UUID uniqueness rather than directly verifying Map scoping; the comment auditor flagged 3 stale references in DEFERRED.md / the index.ts module-level JSDoc, all fixed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
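The `ModelResponse[] | Generator<ModelResponse>` union can be normalized into a single iterator, since arrays and generators are both iterable. This `FakeModel` is a simplified illustration with a hypothetical one-field `ModelResponse`, not the package's testing class.

```typescript
// Hypothetical, simplified response shape.
interface ModelResponse {
  text: string;
}

class FakeModel {
  private readonly source: Iterator<ModelResponse>;

  constructor(responses: ModelResponse[] | Generator<ModelResponse>) {
    // A generator is already an iterator; an array needs one created.
    this.source = Array.isArray(responses) ? responses[Symbol.iterator]() : responses;
  }

  async getResponse(): Promise<ModelResponse> {
    const next = this.source.next();
    if (next.done) throw new Error('FakeModel: no more scripted responses');
    return next.value;
  }
}
```

Collapsing both inputs onto one iterator is what lets a single class replace the former FakeModel / GeneratorFakeModel pair: the consuming code never needs to know which form it was given.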
Replaces three cast patterns at the wire boundary, previously accepted as
trade-offs, with field-by-field projections and indexed access types.
Closes the last open CLEANUP.md item.
worker/activities.ts:
- toSerializedModelResponse: replaced `as unknown as JsonValue` on
Usage with explicit field-by-field projection of all 6 top-level
Usage data fields (requests, inputTokens, outputTokens,
totalTokens, inputTokensDetails, outputTokensDetails) plus
conditional spread of requestUsageEntries (per-entry RequestUsage
projection of all 6 fields). Conditional spread avoids serializing
`requestUsageEntries: undefined`, which would fail JSON round-trip.
- fromSerializedModelRequest: replaced trailing `as ModelRequest`
with field-level casts using indexed access types
(`ModelRequest['prompt']`, `ModelRequest['tracing']`). The named
Prompt and ModelTracing types aren't re-exported from
@openai/agents-core, but indexed access derives them directly from
the ModelRequest interface, automatically tracking upstream
changes. TS now structurally validates the overall shape — adding
a required field upstream produces a compile error here.
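The conditional-spread projection might look like the sketch below. `UsageLike` carries only a subset of the fields listed above, and the function name is hypothetical. A strict round-trip fails on `{ requestUsageEntries: undefined }` because `JSON.stringify` drops the key, so the key is emitted only when a value is present.

```typescript
// Hypothetical sketch of the Usage projection.
type JsonValue = string | number | boolean | null | JsonValue[] | { [k: string]: JsonValue };

interface UsageLike {
  requests: number;
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
  requestUsageEntries?: { inputTokens: number; outputTokens: number }[];
}

function toWireUsage(u: UsageLike): { [k: string]: JsonValue } {
  return {
    requests: u.requests,
    inputTokens: u.inputTokens,
    outputTokens: u.outputTokens,
    totalTokens: u.totalTokens,
    // Emit the key only when present; an explicit undefined would not
    // survive a JSON round-trip unchanged.
    ...(u.requestUsageEntries !== undefined
      ? {
          requestUsageEntries: u.requestUsageEntries.map((e) => ({
            inputTokens: e.inputTokens,
            outputTokens: e.outputTokens,
          })),
        }
      : {}),
  };
}
```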
workflow/activity-backed-model.ts:
- fromSerializedModelResponse: replaced `new Usage(wire.usage as
Record<string, unknown>)` with explicit shape narrowing followed
by `new Usage({...})` with named fields and `?? 0` defaults.
requestUsageEntries reconstructed via `new RequestUsage(...)`
per-entry — preserves prototype chain so any downstream
`instanceof RequestUsage` checks work post-reconstruction.
- Removed trailing `as ModelResponse` cast — return type-checks
structurally with field-level `output` cast.
The only remaining double-cast (`as unknown as JsonValue[]` on
`output`) is genuinely necessary: AgentOutputItem variants contain
`providerData?: Record<string, any>`, and `any` does not extend
JsonValue, blocking single-step narrowing of the discriminated
union. The intermediate `unknown` is the idiomatic escape. Same
pattern used symmetrically going the other direction. Comment in
both directions explains the rationale.
72/72 tests pass (2 occasional flakes pass in isolation). ESLint +
Prettier clean. Both audits PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tivity boundary
Mirrors Python's OpenAIAgentsContextPropagationInterceptor. Closes
the gap where activity-side agent spans (e.g. inside MCP tool calls)
started fresh disconnected trace trees rather than nesting under the
workflow's run.
What was missing: the existing TemporalTracingProcessor bridges
OpenAI Agents trace events to OTel spans within ONE execution
context, but agent traceId/spanId did not propagate when an activity
was scheduled from a workflow. The README claim that
@temporalio/interceptors-opentelemetry handled this was wrong —
that package propagates OTel context, not agent SDK trace IDs.
Mechanism (3 new files):
- common/trace-header.ts — wire shape AgentsSpanHeader (traceId,
spanId, traceName), header key `__openai_span` matching Python,
inject/extract helpers using defaultPayloadConverter.
- workflow/trace-interceptor.ts — OpenAIAgentsTraceOutboundInterceptor
hooks scheduleActivity, scheduleLocalActivity,
startChildWorkflowExecution, signalWorkflow. Reads
getCurrentTrace()/getCurrentSpan() from the agent SDK; when an
agent trace is active, injects the header. Activity scheduling
paths are additionally wrapped in withCustomSpan
('temporal:startActivity:<type>') so the temporal call appears in
the agent trace tree. Child-workflow and signal paths inject
headers only — no span (matches Python).
- worker/trace-interceptor.ts —
OpenAIAgentsTraceActivityInboundInterceptor.execute extracts the
header and restores agent context activity-side. Two paths:
  - startTraces=false (default): sets the agent SDK's
    AsyncLocalStorage directly via the well-known symbol
    Symbol.for('openai.agents.core.asyncLocalStorage'). No
    processor events fire (pure ID propagation). The
    getCurrentTrace() call ahead of the symbol access is
    deliberate: it forces upstream's lazy ALS init so the symbol
    actually exists on globalThis when we read it.
  - startTraces=true: uses withTrace + setCurrentSpan and fires
    processor events on activity-side processors. The span lifecycle is
    balanced via try/finally so span.end() always pairs with span.start().
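A sketch of the `__openai_span` header round-trip, assuming headers are a plain string map and JSON is the encoding. The package instead stores a Temporal payload produced by defaultPayloadConverter. Function names are hypothetical; only the header key and the AgentsSpanHeader fields come from the commit.

```typescript
// Hypothetical sketch: a string map stands in for Temporal's payload headers.
interface AgentsSpanHeader {
  traceId: string;
  spanId?: string;
  traceName?: string;
}
type HeaderMap = Record<string, string>;

const HEADER_KEY = '__openai_span';

function injectSpanHeader(headers: HeaderMap, span: AgentsSpanHeader | undefined): HeaderMap {
  if (!span) return headers; // no active agent trace: leave headers untouched
  return { ...headers, [HEADER_KEY]: JSON.stringify(span) };
}

function extractSpanHeader(headers: HeaderMap): AgentsSpanHeader | undefined {
  const raw = headers[HEADER_KEY];
  if (!raw) return undefined;
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== 'object' || parsed === null || typeof (parsed as AgentsSpanHeader).traceId !== 'string') {
    return undefined; // malformed header: run without restored context
  }
  return parsed as AgentsSpanHeader;
}
```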
Plugin wiring (worker/plugin.ts):
- New OpenAIAgentsPluginOptions.traceInterceptor option (forwards
to the activity-side interceptor).
- workerInterceptors.workflowModules registers
workflow/trace-interceptor.ts.
- workerInterceptors.activity registers the inbound interceptor.
- package.json adds ./workflow-interceptor and
./lib/workflow/trace-interceptor export paths so users with
pre-built workflow bundles can reference the module by name in
workflowInterceptorModules.
Stale claim removed: the comment in workflow/tracing.ts that
attributed agent trace propagation to interceptors-opentelemetry
now correctly cites the new interceptor pair. interceptors-
opentelemetry is still cited for OTel span nesting, which it does
handle.
T4 test: workflow runs an agent through a TraceCaptureModelProvider
that calls getCurrentTrace() inside the activity and returns the
captured traceId in the response text. Test asserts
activityTraceId === workflowTraceId — proves end-to-end propagation
across the wf→header→activity→ALS→model chain.
Two bugs caught during implementation:
1. Test infrastructure pre-builds the workflow bundle before
plugins are instantiated, so the plugin's workflowModules never
reach the bundler. Test setup now includes the interceptor via
workflowInterceptorModules. Production users either get the
plugin's auto-bundling path or must add the module to their
own bundle config — same pattern as any workflow interceptor.
2. The agent SDK's AsyncLocalStorage is created lazily on first
call to a tracing function. Activity-side context restoration
has to call getCurrentTrace() once before reading the
well-known symbol, otherwise the ALS doesn't exist yet.
72/72 tests pass (2 occasional flakes pass in isolation). ESLint +
Prettier clean. Both audits PASS — code-auditor noted one DESIGN
finding (orphaned span in startTraces=true path) which is fixed in
this commit; comment-auditor noted 2 NITs (missing JSDoc) also
fixed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nt/workflow inbound interceptors, deterministic IDs, addTemporalSpans toggle
Closes the parity gap with Python's `_trace_interceptor.py`. Prior
commits shipped only workflow-outbound + activity-inbound — the
client-side and workflow-inbound halves were missing. Without them,
agent trace context stopped at the workflow boundary in either
direction (a client running an agent and starting a workflow couldn't
see the workflow's spans nest under the client's trace; a workflow
receiving a signal couldn't see the signal's spans nest under the
sender's trace).
This commit adds the missing halves and brings the full propagation
chain to parity, plus deterministic ID generation and the
add_temporal_spans toggle.
Parity table — every method in Python's `_trace_interceptor.py`
maps to a TS equivalent (verified end-to-end by audit, no missing
rows). 32 methods across 5 classes plus 1 module-level helper.
Code:
- New `src/client/trace-interceptor.ts` —
OpenAIAgentsTraceClientInterceptor with 6 methods
(startWithDetails, signal, query, startUpdate, signalWithStart,
startUpdateWithStart). Reads getCurrentTrace/getCurrentSpan from the
agent SDK, injects the __openai_span header into client outbound
calls, optionally wraps each in a temporal:* custom span.
startUpdateWithStart correctly injects into both
workflowStartHeaders and updateHeaders — TS-only extension since
Python's client doesn't expose this op.
- `src/workflow/trace-interceptor.ts` extended with
OpenAIAgentsTraceInboundInterceptor implementing
WorkflowInboundCallsInterceptor. Five hooks: execute (with
temporal:executeWorkflow span), handleSignal (with span +
signalName data), handleQuery (span only — queries don't carry
trace context, matches Python), validateUpdate (sync ALS-run
context restore — closes the parity gap with Python's
handle_update_validator), handleUpdate (context restore, no
span). Outbound now also handles startChildWorkflowExecution
and signalWorkflow with the .finally(() => span.end()) pattern
on the result promise, mirroring Python's done callback.
- `src/worker/trace-interceptor.ts` updated to wrap activity
execution in a temporal:executeActivity span with activityId/
activityType metadata. Both inbound interceptors share
withRestoredAgentsTraceContext from a new common helper.
- `src/common/trace-context.ts` (NEW) — shared trace context
restoration helpers. Two variants per direction: async
(withTrace + setCurrentSpan, fires processor events for activity
inbound and workflow inbound async paths) and sync
(AsyncLocalStorage.run directly, no events, used by validateUpdate
which must return synchronously). The withTraceEvents path closes
spans via try/finally to balance start/end pairs (DESIGN finding
from prior batch fixed here too).
- `src/common/trace-header.ts` — moved currentAgentsSpanHeader()
here from the per-side interceptor files; both client and
workflow interceptors now import it. Eliminates the prior
duplication.
- `src/workflow/tracing.ts` —
- installDeterministicTraceIds() wraps the global TraceProvider's
createTrace and createSpan to inject IDs from workflow.uuid4()
when the caller doesn't supply one. Replaces implicit reliance
on the crypto.randomUUID polyfill (which stays as belt-and-
suspenders). Mirrors Python's gen_trace_id/gen_span_id override.
- addTemporalSpans + startTraces config plumbed through globalThis
Symbol so workflow interceptors (which can't accept runtime
options through workflowModules) can read it.
- First-call-wins runtime warning when ensureTracingProcessorRegistered
is called again with conflicting options.
- `src/worker/plugin.ts` registers the new client interceptor via
clientInterceptors.workflow alongside the existing activity +
workflow modules wiring.
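installDeterministicTraceIds() is described as wrapping createTrace/createSpan and filling in missing IDs from workflow.uuid4(). The sketch below shows that pattern with a hypothetical provider shape and a counter-based UUID source standing in for uuid4(). Taking the last 24 hex characters for span IDs is an assumption chosen to match the span_<24hex> format that T8 checks.

```typescript
// Hypothetical provider shape; upstream's TraceProvider differs.
type IdSource = () => string; // returns a UUID-formatted string

interface TraceProviderLike {
  createTrace(opts: { traceId?: string }): { traceId: string };
  createSpan(opts: { spanId?: string }): { spanId: string };
}

function installDeterministicIds(provider: TraceProviderLike, nextUuid: IdSource): void {
  const hex = () => nextUuid().replace(/-/g, ''); // 32 hex chars
  const origTrace = provider.createTrace.bind(provider);
  const origSpan = provider.createSpan.bind(provider);
  // Caller-supplied IDs win; only missing IDs consume the seeded source.
  provider.createTrace = (opts) => origTrace({ ...opts, traceId: opts.traceId ?? `trace_${hex()}` });
  provider.createSpan = (opts) => origSpan({ ...opts, spanId: opts.spanId ?? `span_${hex().slice(-24)}` });
}

// Deterministic UUID-shaped source for the demo (a counter, not random).
function counterUuids(): IdSource {
  let n = 0;
  return () => {
    n += 1;
    const h = n.toString(16).padStart(32, '0');
    return `${h.slice(0, 8)}-${h.slice(8, 12)}-${h.slice(12, 16)}-${h.slice(16, 20)}-${h.slice(20)}`;
  };
}
```

Two providers fed identically seeded sources produce identical IDs, which is the property replay depends on.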
Tests (test-openai-agents.ts):
- All 8 tracing tests now use maxCachedWorkflows: 0 to force replay
every workflow task. Previously only T2 did. T1, T3, T4 retrofitted.
- T5: client → workflow propagation. Wraps executeWorkflow in
withTrace, asserts workflow's getCurrentTrace().traceId equals
client's.
- T6: signal carrying trace context. Parent in withTrace signals
child; child's signal handler captures traceId, asserts it
matches parent's.
- T7: child workflow propagation. executeChild from a parent in
withTrace; child returns getCurrentTrace().traceId, assert match.
- T8: deterministic IDs and timestamps. Captures trace/span IDs
and span.startedAt under forced replay; asserts ID format
(trace_<32hex>, span_<24hex>) and ISO 8601 timestamps. Reaching
the assertions without NondeterminismError implicitly proves
determinism end-to-end.
n/a items (with rationale documented in code):
- timeIso() override: upstream's only clock source is
new Date().toISOString() (verified against v0.3.9). The Temporal
V8 sandbox replaces Date with a deterministic clock, so this is
automatically replay-safe. T8's NondeterminismError check guards
the assumption.
- ActivitySpanData: TS uses CustomSpanData with inline data fields
for activity metadata; no separate typed class needed.
77/77 tests pass. ESLint + Prettier clean. Both audits PASS — code
auditor independently produced a parity table and confirmed no
missing rows. Comment auditor verified all interceptor classes have
JSDoc with Python mirror references.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ensive E2E test
- Auto-wire ActivityTracingProcessor on activity workers so temporal:executeActivity spans reach OTel.
- Derive OTel trace/span IDs deterministically from the agent SDK trace context via a SeedableIdGenerator; workflow and activity sides unify into a single OTel trace tree.
- Move outbound trace-header capture inside the maybeTemporalSpan callback so the propagated spanId is the temporal:startActivity:* span itself, not its parent generation span.
- Flip addTemporalSpans default from true to false on plugin and runner; add JSDoc warning on plugin's traceInterceptor about the workflow-side config requirement.
- Drop the startSpansInReplay option and shouldSkip() guards from the workflow tracing processor; replay safety is automatic via the deterministic ID generator.
- Add a comprehensive end-to-end tracing test: one workflow exercising basic agent, inline tool(), activityAsTool, multi-turn loop, handoff, stateless MCP, input/output guardrails, child workflow with sub-agent, signals, queries, and updates with/without validators. Run with both addTemporalSpans=true and false. Worker restart between phases. Exact-hierarchy match (timing-dependent handler spans filtered) and exact span-count assertions.
- Package-wide comment cleanup: remove internal codes, cross-SDK parity references, time-rotting history, and stale symbol references.
- Remove dev-process notes (CLEANUP.md, DEFERRED.md, MIGRATION.md) that shouldn't ship inside the published package.
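Deriving OTel IDs from the agents trace context can be sketched as an ID generator that is seeded just before a span starts. A minimal sketch under stated assumptions: `SeedableIdGeneratorSketch`, its `seed()` method, and the SHA-256 truncation for span IDs are hypothetical choices, not necessarily how the package's SeedableIdGenerator works. The interface only structurally mirrors OpenTelemetry's IdGenerator (generateTraceId/generateSpanId), which expects 32-hex trace IDs and 16-hex span IDs.

```typescript
import { createHash, randomBytes } from 'node:crypto';

// Structurally mirrors OpenTelemetry's IdGenerator interface.
interface IdGeneratorLike {
  generateTraceId(): string;
  generateSpanId(): string;
}

// First `hexLen` hex chars of SHA-256(input): deterministic for a given input.
function hashHex(input: string, hexLen: number): string {
  return createHash('sha256').update(input).digest('hex').slice(0, hexLen);
}

// Hypothetical sketch: callers seed() with the agents-SDK IDs before each span start.
// Each generate* call consumes its own seed; unseeded calls fall back to random IDs.
class SeedableIdGeneratorSketch implements IdGeneratorLike {
  private traceSeed: string | undefined;
  private spanSeed: string | undefined;

  seed(agentsTraceId: string, agentsSpanId: string): void {
    this.traceSeed = agentsTraceId;
    this.spanSeed = agentsSpanId;
  }

  // OTel trace IDs are 32 lowercase hex chars (16 bytes).
  generateTraceId(): string {
    const seed = this.traceSeed;
    this.traceSeed = undefined;
    if (seed === undefined) return randomBytes(16).toString('hex');
    // Agents trace IDs already carry 32 hex chars after "trace_"; reuse them, else hash.
    const m = /^trace_([0-9a-f]{32})$/.exec(seed);
    return m?.[1] ?? hashHex(seed, 32);
  }

  // OTel span IDs are 16 lowercase hex chars (8 bytes); agents span IDs carry 24, so hash down.
  generateSpanId(): string {
    const seed = this.spanSeed;
    this.spanSeed = undefined;
    if (seed === undefined) return randomBytes(8).toString('hex');
    return hashHex(seed, 16);
  }
}
```

Because the output depends only on the seed, a workflow worker and an activity worker that see the same agents context compute the same OTel IDs, which is what lets both sides land in one trace tree.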
Summary
Adds `@temporalio/openai-agents`, a Temporal plugin that runs OpenAI Agents SDK workflows as durable Temporal workflows. Model invocations become Temporal activities; the agent loop (tools, handoffs, guardrails) runs deterministically in the workflow sandbox.
User-facing API:
Plus `activityAsTool`, `statelessMcpServer`, `StatelessMCPServerProvider`, tracing utilities, and a public `testing` namespace.
Design
Design proposal lives at `openai-agents-proposal-v2.md` in the repo root of the branch's working tree (not committed). High-level: the runner recursively converts the agent graph, swapping each agent's model for a `TemporalModelStub`. The stub dispatches `getResponse` calls to `invokeModelActivity`, which runs on the activity worker where the real `ModelProvider` lives. The agent loop stays in the workflow, so tool calls and handoffs are durable.
Correctness-critical behavior
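The recursive conversion described in the design paragraph, together with the handoff-conversion and tool-validation behavior listed below, can be sketched as a memoized graph walk. This is a minimal sketch, not the package's implementation: `AgentLike`, `ToolLike`, `ModelLike`, `ACTIVITY_TOOL`, and `convertAgentGraph` are hypothetical, handoffs are modeled as direct agent references rather than SDK `Handoff` objects, and `makeModel` stands in for constructing the activity-backed model stub.

```typescript
// Hypothetical, simplified shapes; the real Agents SDK Agent/Handoff/Tool types are much richer.
interface ModelLike {
  getResponse(input: string): Promise<string>;
}
interface ToolLike {
  name: string;
  [marker: symbol]: unknown;
}
interface AgentLike {
  name: string;
  model?: ModelLike | string;
  tools: ToolLike[];
  handoffs: AgentLike[];
}

// Hypothetical marker that a helper like activityAsTool() would attach to the tools it builds.
const ACTIVITY_TOOL = Symbol('activityTool');

// Returns a converted copy of the graph rooted at `root`; the input graph is never mutated.
function convertAgentGraph(root: AgentLike, makeModel: (agent: AgentLike) => ModelLike): AgentLike {
  // original agent -> converted clone; doubles as cycle detection
  const converted = new Map<AgentLike, AgentLike>();

  const visit = (agent: AgentLike): AgentLike => {
    const existing = converted.get(agent);
    if (existing !== undefined) return existing; // seen before (possibly mid-conversion): reuse it

    // Reject any tool that lacks the activity marker.
    for (const tool of agent.tools) {
      if (tool[ACTIVITY_TOOL] !== true) {
        throw new TypeError(`Agent "${agent.name}": tool "${tool.name}" is not activity-backed`);
      }
    }

    // Shallow clone with the model swapped; any other fields (callbacks etc.) carry over.
    const clone: AgentLike = { ...agent, model: makeModel(agent), handoffs: [] };
    converted.set(agent, clone); // register before recursing so cycles terminate
    clone.handoffs = agent.handoffs.map(visit);
    return clone;
  };

  return visit(root);
}
```

Registering each clone before recursing is what lets a cyclic graph (A hands off to B, B back to A) terminate and come out as an equally cyclic graph of clones, while the caller's objects keep their original models and handoff lists.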
- Handoff conversion: shallow-clones user `Handoff` objects (does not mutate); preserves `onHandoff`, `inputType`, `isEnabled`; recursive cycle detection for cyclic handoff graphs.
- Error classification: inspects the OpenAI SDK error shape (direct `error.status`/`error.headers` and legacy `error.response.*`); honors `x-should-retry` and `retry-after` headers; derives `ModelInvocationError` subtypes (`.RateLimit`, `.Authentication`, `.BadRequest`, `.ServerError`, `.Timeout`, `.Conflict`).
- Tool validation: `activityAsTool`-produced tools carry a symbol marker; the runner rejects raw functions and FunctionTools built via the bare `tool()` factory, and recurses into handoff agents' tools.
- Error wrapping: `AgentsWorkflowError` wraps non-Temporal errors as the cause of an `ApplicationFailure`; `TemporalFailure`s buried in the cause chain are unwrapped rather than re-wrapped.
- Workflow sandbox polyfills: `Headers`, `ReadableStream`, `structuredClone`, `crypto.randomUUID` (deterministic via `uuid4()`), `EventTarget`/`Event`/`CustomEvent` (workflow-safe, isolated listener errors).
What's deferred
See `packages/openai-agents/DEFERRED.md`:
- `StatefulMCPServerProvider`
- `nexusOperationAsTool` (TS SDK doesn't expose `executeNexusOperation` yet)
- `workflowFailureExceptionTypes` registration (TS SDK doesn't support this concept)
- `testing.AgentEnvironment`, `testing.ResponseBuilders` class
Tests
61 integration tests in `packages/test/src/test-openai-agents.ts` cover the full feature surface, plus bug-reproduction coverage for every audit finding fixed during development (4 audit rounds, ~65 findings addressed). Tests use `FakeModelProvider` and `GeneratorFakeModelProvider` for determinism; optional remote tests against the real OpenAI API can be gated behind an env var in a follow-up.
Known test flakiness: full-suite runs occasionally show 1-5 test failures under sequential load with the dev server ("service rate limit exceeded"); the same tests pass reliably when run in isolation via `--match`. This is infrastructure flakiness, not a correctness problem. Tracking for follow-up (potential mitigations: `ava` retry config, `WorkflowEnvironment.createTimeSkipping`, splitting the test file).
Test plan
- `pnpm --filter @temporalio/openai-agents exec tsc --build`
- `pnpm --filter @temporalio/test run build:ts`
- `pnpm --filter @temporalio/test exec ava ./lib/test-openai-agents.js`
- `pnpm --filter @temporalio/openai-agents exec eslint src/`
- `pnpm --filter @temporalio/openai-agents exec prettier --check src/`
🤖 Generated with Claude Code